Sampling Methods for ILP
Abstract
This paper is concerned with problems that arise when large quantities of data are submitted for analysis by an Inductive Logic Programming (ILP) system. Complexity arguments usually make it prohibitive to analyse such datasets in their entirety. We examine two schemes that allow an ILP system to construct theories by sampling from this large pool of data. The first, "subsampling", is a single-sample design in which the utility of a potential rule is evaluated on a randomly selected sub-sample of the data. The second, "logical windowing", is a multiple-sample design that tests a partially correct theory and sequentially includes the errors it makes. Both schemes are derived from techniques developed to enable propositional learning methods (like decision trees) to cope with large datasets. The ILP system CProgol, equipped with each of these methods, is used to construct theories for two datasets: one artificial (a chess endgame) and the other naturally occurring (a language tagging problem). In each case, we ask the following questions of CProgol equipped with sampling: (1) Is its theory comparable in predictive accuracy to that obtained if all the data were used (that is, no sampling was employed)? and (2) Is its theory constructed in less time than the one obtained with all the data? For the problems considered, the answer to both questions is "yes". This suggests that an ILP program equipped with an appropriate sampling method could begin to address satisfactorily problems that have hitherto been inaccessible simply because of the extent of the data.
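The two sampling schemes described in the abstract can be illustrated with a minimal Python sketch. This is not CProgol's implementation: the function names, the accuracy-based utility, the toy threshold learner, and the `max_added` parameter are all assumptions made for illustration, and examples are simplified to (integer, label) pairs rather than logical clauses.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[int, bool]      # hypothetical encoding: (value, class label)
Rule = Callable[[int], bool]    # a rule maps a value to a predicted label


def subsample_utility(rule: Rule, examples: Sequence[Example],
                      sample_size: int, rng: random.Random) -> float:
    """Subsampling: score a candidate rule on one random sub-sample only."""
    if not examples or sample_size < 1:
        raise ValueError("need at least one example and sample_size >= 1")
    sample = rng.sample(list(examples), min(sample_size, len(examples)))
    correct = sum(1 for x, label in sample if rule(x) == label)
    return correct / len(sample)


def windowing(learn: Callable[[List[Example]], Rule],
              examples: Sequence[Example], initial_size: int,
              max_added: int, rng: random.Random) -> Tuple[Rule, List[Example]]:
    """Windowing: learn on a small window, then repeatedly add examples
    that the current theory misclassifies and learn again."""
    pool = list(examples)
    rng.shuffle(pool)
    window, rest = pool[:initial_size], pool[initial_size:]
    theory = learn(window)
    while True:
        # Indices of examples outside the window that the theory gets wrong.
        wrong = [i for i, (x, y) in enumerate(rest) if theory(x) != y]
        if not wrong:
            return theory, window
        chosen = set(wrong[:max_added])
        window.extend(rest[i] for i in sorted(chosen))
        rest = [e for i, e in enumerate(rest) if i not in chosen]
        theory = learn(window)  # rest shrinks every round, so this terminates


def learn_threshold(window: List[Example]) -> Rule:
    """Toy learner (assumption): predict True for x >= smallest positive seen."""
    positives = [x for x, label in window if label]
    threshold = min(positives) if positives else float("inf")
    return lambda x: x >= threshold
```

In this sketch, subsampling trades accuracy of the utility estimate for speed by looking at a single sample, while windowing keeps the training set small and grows it only with examples that expose errors in the current theory.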
Similar resources
Investigation of the Effect of an Imidazolium-Based Energetic Ionic Liquid Plasticizer on the Thermal Properties of Nitrocellulose
This paper investigates the effect of an energetic imidazolium ionic liquid plasticizer (ILP) on the degradation kinetics of nitrocellulose, which is an important component of double-base solid propellants. For better comparison and evaluation, diethyl phthalate (DEP) plasticizer, which has a structure similar to ILP, was also evaluated. Heat of combustion analysis was performed to evaluate the en...
A New ILP Model for Identical Parallel-Machine Scheduling with Family Setup Times Minimizing the Total Weighted Flow Time by a Genetic Algorithm
This paper presents a novel, integer-linear programming (ILP) model for an identical parallel-machine scheduling problem with family setup times that minimizes the total weighted flow time (TWFT). Some researchers have addressed parallel-machine scheduling problems in the literature over the last three decades. However, the existing studies have been limited to the research of independent jobs,...
Neuro-Symbolic EDA-Based Optimization Using ILP-Enhanced DBNs
We investigate solving discrete optimization problems using the ‘estimation of distribution’ (EDA) approach via a novel combination of deep belief networks (DBN) and inductive logic programming (ILP). While DBNs are used to learn the structure of successively ‘better’ feasible solutions, ILP enables the incorporation of domain-based background knowledge related to the goodness of solutions. Rec...
Stochastic Refinement
The research presented in this paper is motivated by the following question. How can the generality order of clauses and the relevant concepts such as refinement be adapted to be used in a stochastic search? To address this question we introduce the concept of stochastic refinement operators and adapt a framework, called stochastic refinement search. In this paper we introduce stochastic refine...
A New Solution Region for Solving the Interval Linear Programming Model
We consider interval linear programming (ILP) problems in the current paper. Best-worst case (BWC) is one of the methods for solving ILP models. BWC determines the values of the target function, but some of the solutions obtained through BWC may result in an infeasible space. To guarantee that the solution is completely feasible (i.e., to avoid constraint violation), improved two-step method (ITSM) ...